Partitioning
When working with very large datasets, queries can become slow and storage can be inefficient.
Partitioning is a MySQL technique that splits a large table into smaller, more manageable pieces (called partitions) but still treats them as a single logical table.
- Partitioning = dividing a table’s rows into separate physical storage chunks.
- Queries can target only the relevant partition instead of scanning the entire table.
- MySQL decides automatically which partition to search based on the query condition.
This improves query performance, manageability, and maintenance.
Benefits of Partitioning
- Faster queries (partition pruning → scans only relevant partitions).
- Better performance with huge tables (billions of rows).
- Easier maintenance (you can archive or drop a partition without touching the rest).
- Efficient storage (old data can be separated).
Limitations of Partitioning
- Partition key must be part of PRIMARY KEY or UNIQUE KEY.
- No foreign keys allowed in partitioned tables.
- Too many partitions can slow down queries.
- Not all storage engines support it (must be InnoDB in modern MySQL).
Types of Partitioning
- RANGE → ranges of values (time-based data).
- LIST → specific values (categories).
- HASH → evenly distribute with a hash function.
- KEY → system-managed hash (usually on primary keys).
RANGE Partitioning
Rows are placed into partitions based on ranges of values.
- Good for time-based data (e.g., monthly logs).
CREATE TABLE sales (
id INT,
sale_date DATE,
amount DECIMAL(10,2)
)
PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2019 VALUES LESS THAN (2020),
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
Data from 2019 goes into p2019, 2020 into p2020, and so on.
Query:
SELECT * FROM sales WHERE sale_date BETWEEN '2020-01-01' AND '2020-12-31';
MySQL only scans p2020, not the whole table.
LIST Partitioning
Rows are partitioned based on a list of specific values.
- Good for categorical data (e.g., regions, departments).
CREATE TABLE employees (
id INT,
name VARCHAR(50),
region VARCHAR(10)
)
PARTITION BY LIST COLUMNS (region) (
PARTITION p_north VALUES IN ('North'),
PARTITION p_south VALUES IN ('South'),
PARTITION p_east VALUES IN ('East'),
PARTITION p_west VALUES IN ('West')
);
HASH Partitioning
Rows are distributed into partitions based on a hash function.
- Good for evenly distributing data when values are unpredictable.
CREATE TABLE orders (
order_id INT,
customer_id INT,
amount DECIMAL(10,2)
)
PARTITION BY HASH(customer_id) PARTITIONS 4;
Rows are distributed across 4 partitions using a hash of customer_id. Useful for load balancing queries across partitions.
KEY Partitioning
Similar to HASH, but MySQL chooses the hash function internally.
- Typically used with primary keys.
CREATE TABLE products (
product_id INT PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
)
PARTITION BY KEY(product_id) PARTITIONS 3;
MySQL automatically hashes product_id into 3 partitions.
Practical Example of Partitioning
Imagine a website access log table with millions of rows per year:
CREATE TABLE access_logs (
id INT AUTO_INCREMENT PRIMARY KEY,
log_date DATE NOT NULL,
user_id INT,
action VARCHAR(100)
)
PARTITION BY RANGE (YEAR(log_date)*100 + MONTH(log_date)) (
PARTITION p202201 VALUES LESS THAN (202202),
PARTITION p202202 VALUES LESS THAN (202203),
PARTITION p202203 VALUES LESS THAN (202204),
PARTITION pmax VALUES LESS THAN MAXVALUE
);
- January 2022 logs →
p202201 - February 2022 logs →
p202202
Query example:
SELECT * FROM access_logs WHERE log_date BETWEEN '2022-02-01' AND '2022-02-28';
Only scans p202202, making it much faster.